METHOD AND APPARATUS FOR HIGH RESOLUTION
SPEECH RECONSTRUCTION

BACKGROUND OF THE INVENTION 
The present invention relates to speech processing. In particular,
the present invention relates to speech enhancement.

In speech recognition, it is common to condition the speech signal
to remove noise and portions of the speech signal that are not
helpful in decoding the speech into text. For example, it is common
to apply a frequency-based transform to the speech signal to reduce
certain frequencies in the signal that do not aid in decoding the
speech signal. One common frequency-based transform is known as a
Mel-Scale transform, which reduces pitch harmonics in the speech
signal. Mel-Scale transforms are used because the pitch at which
someone speaks does not affect the listener's ability to discern
what is being said. By removing these harmonics, smaller speech
models can be constructed because they do not have to be trained to
decode speech at different pitches. Instead, the Mel-Scale transform
creates pitch-independent models that can be used to decode speech
of any pitch.

Speech systems also attempt to enhance the speech signal by removing
noise before performing speech recognition. Under some systems, this
is done in the time domain by applying a noise filter to the speech
signal. In other systems, this enhancement is performed using a
two-stage process in which the pitch of the speech is first tracked
using a pitch tracker and then the pitch is used to separate the
speech signal from the noise. For various reasons, such two-stage
processing is undesirable.

A third system for removing noise from a speech signal attempted to
identify a clean speech signal in a noisy signal using a
probabilistic framework that provided a Minimum Mean Square Error
(MMSE) estimate of the clean signal given a noisy signal. This
system was designed for speech recognition and as such relied on
feature vectors that were appropriate for speech recognition. In
particular, this probabilistic system used speech vectors that were
produced using the Mel-Scale transform.

Although this probabilistic system did not require two-stage
processing, it was less than ideal for speech enhancement because
the Mel-Scale transform removed information from the signal. Because
of this loss of information, it is extremely difficult, if not
impossible, to reconstruct from the "cleaned" signal a speech signal
that humans can easily understand.

Thus, the current systems for enhancing speech are less than ideal
since they either require a two-stage process or make it impossible
to reconstruct a clean intelligible speech signal.

SUMMARY OF THE INVENTION
A method and apparatus identify a clean speech signal from a noisy
speech signal. The noisy speech signal is converted into frequency
values in the frequency domain. The parameters of at least one
posterior probability of at least one component of a clean signal
value are then determined based on the frequency values. This
determination is made without applying a frequency-based filter to
the frequency values. The parameters of the posterior probability
distribution are then used to estimate a set of frequency values for
the clean speech signal.



BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a general computing environment in
which the present invention may be practiced.

FIG. 2 is a block diagram of a mobile device in which the present
invention may be practiced.

FIG. 3 is a block diagram of a speech enhancement system under one
embodiment of the present invention.

FIG. 4 is a flow diagram of a speech enhancement method under one
embodiment of the present invention.

FIG. 5 is a flow diagram for determining a posterior probability of
a clean signal given a noisy signal under one embodiment of the
present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates an example of a suitable computing system
environment 100 on which the invention may be implemented. The
computing system environment 100 is only one example of a suitable
computing environment and is not intended to suggest any limitation
as to the scope of use or functionality of the invention. Neither
should the computing environment 100 be interpreted as having any
dependency or requirement relating to any one or combination of
components illustrated in the exemplary operating environment 100.

The invention is operational with numerous other general purpose or
special purpose computing system environments or configurations.
Examples of well-known computing systems, environments, and/or
configurations that may be suitable for use with the invention
include, but are not limited to, personal computers, server
computers, hand-held or laptop devices, multiprocessor systems,
microprocessor-based systems, set top boxes, programmable consumer
electronics, network PCs, minicomputers, mainframe computers,
telephony systems, distributed computing environments that include
any of the above systems or devices, and the like.

The invention may be described in the general context of
computer-executable instructions, such as program modules, being
executed by a computer. Generally, program modules include routines,
programs, objects, components, data structures, etc. that perform
particular tasks or implement particular abstract data types. The
invention may also be practiced in distributed computing
environments where tasks are performed by remote processing devices
that are linked through a communications network. In a distributed
computing environment, program modules may be located in both local
and remote computer storage media including memory storage devices.

With reference to FIG. 1, an exemplary system for implementing the
invention includes a general-purpose computing device in the form of
a computer 110. Components of computer 110 may include, but are not
limited to, a processing unit 120, a system memory 130, and a system
bus 121 that couples various system components including the system
memory to the processing unit 120. The system bus 121 may be any of
several types of bus structures including a memory bus or memory
controller, a peripheral bus, and a local bus using any of a variety
of bus architectures. By way of example, and not limitation, such
architectures include Industry Standard Architecture (ISA) bus,
Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video
Electronics Standards Association (VESA) local bus, and Peripheral
Component Interconnect (PCI) bus also known as Mezzanine bus.

Computer 110 typically includes a variety of computer readable
media. Computer readable media can be any available media that can
be accessed by computer 110 and includes both volatile and
nonvolatile media, removable and non-removable media. By way of
example, and not limitation, computer readable media may comprise
computer storage media and communication media. Computer storage
media includes both volatile and nonvolatile, removable and
non-removable media implemented in any method or technology for
storage of information such as computer readable instructions, data
structures, program modules or other data. Computer storage media
includes, but is not limited to, RAM, ROM, EEPROM, flash memory or
other memory technology, CD-ROM, digital versatile disks (DVD) or
other optical disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage devices, or any
other medium which can be used to store the desired information and
which can be accessed by computer 110. Communication media typically
embodies computer readable instructions, data structures, program
modules or other data in a modulated data signal such as a carrier
wave or other transport mechanism and includes any information
delivery media. The term "modulated data signal" means a signal that
has one or more of its characteristics set or changed in such a
manner as to encode information in the signal. By way of example,
and not limitation, communication media includes wired media such as
a wired network or direct-wired connection, and wireless media such
as acoustic, RF, infrared and other wireless media. Combinations of
any of the above should also be included within the scope of
computer readable media.

The system memory 130 includes computer storage media in the form of
volatile and/or nonvolatile memory such as read only memory (ROM)
131 and random access memory (RAM) 132. A basic input/output system
133 (BIOS), containing the basic routines that help to transfer
information between elements within computer 110, such as during
start-up, is typically stored in ROM 131. RAM 132 typically contains
data and/or program modules that are immediately accessible to
and/or presently being operated on by processing unit 120. By way of
example, and not limitation, FIG. 1 illustrates operating system
134, application programs 135, other program modules 136, and
program data 137.

The computer 110 may also include other removable/non-removable
volatile/nonvolatile computer storage media. By way of example only,
FIG. 1 illustrates a hard disk drive 141 that reads from or writes
to non-removable, nonvolatile magnetic media, a magnetic disk drive
151 that reads from or writes to a removable, nonvolatile magnetic
disk 152, and an optical disk drive 155 that reads from or writes to
a removable, nonvolatile optical disk 156 such as a CD ROM or other
optical media. Other removable/non-removable, volatile/nonvolatile
computer storage media that can be used in the exemplary operating
environment include, but are not limited to, magnetic tape
cassettes, flash memory cards, digital versatile disks, digital
video tape, solid state RAM, solid state ROM, and the like. The hard
disk drive 141 is typically connected to the system bus 121 through
a non-removable memory interface such as interface 140, and magnetic
disk drive 151 and optical disk drive 155 are typically connected to
the system bus 121 by a removable memory interface, such as
interface 150.

The drives and their associated computer storage media discussed
above and illustrated in FIG. 1 provide storage of computer readable
instructions, data structures, program modules and other data for
the computer 110. In FIG. 1, for example, hard disk drive 141 is
illustrated as storing operating system 144, application programs
145, other program modules 146, and program data 147. Note that
these components can either be the same as or different from
operating system 134, application programs 135, other program
modules 136, and program data 137. Operating system 144, application
programs 145, other program modules 146, and program data 147 are
given different numbers here to illustrate that, at a minimum, they
are different copies.

A user may enter commands and information into the computer 110
through input devices such as a keyboard 162, a microphone 163, and
a pointing device 161, such as a mouse, trackball or touch pad.
Other input devices (not shown) may include a joystick, game pad,
satellite dish, scanner, or the like. These and other input devices
are often connected to the processing unit 120 through a user input
interface 160 that is coupled to the system bus, but may be
connected by other interface and bus structures, such as a parallel
port, game port or a universal serial bus (USB). A monitor 191 or
other type of display device is also connected to the system bus 121
via an interface, such as a video interface 190. In addition to the
monitor, computers may also include other peripheral output devices
such as speakers 197 and printer 196, which may be connected through
an output peripheral interface 195.

The computer 110 is operated in a networked environment using
logical connections to one or more remote computers, such as a
remote computer 180. The remote computer 180 may be a personal
computer, a hand-held device, a server, a router, a network PC, a
peer device or other common network node, and typically includes
many or all of the elements described above relative to the computer
110. The logical connections depicted in FIG. 1 include a local area
network (LAN) 171 and a wide area network (WAN) 173, but may also
include other networks. Such networking environments are commonplace
in offices, enterprise-wide computer networks, intranets and the
Internet.

When used in a LAN networking environment, the computer 110 is
connected to the LAN 171 through a network interface or adapter 170.
When used in a WAN networking environment, the computer 110
typically includes a modem 172 or other means for establishing
communications over the WAN 173, such as the Internet. The modem
172, which may be internal or external, may be connected to the
system bus 121 via the user input interface 160, or other
appropriate mechanism. In a networked environment, program modules
depicted relative to the computer 110, or portions thereof, may be
stored in the remote memory storage device. By way of example, and
not limitation, FIG. 1 illustrates remote application programs 185
as residing on remote computer 180. It will be appreciated that the
network connections shown are exemplary and other means of
establishing a communications link between the computers may be
used.

FIG. 2 is a block diagram of a mobile device 200, which is an
exemplary computing environment. Mobile device 200 includes a
microprocessor 202, memory 204, input/output (I/O) components 206,
and a communication interface 208 for communicating with remote
computers or other mobile devices. In one embodiment, the
aforementioned components are coupled for communication with one
another over a suitable bus 210.

Memory 204 is implemented as non-volatile electronic memory such as
random access memory (RAM) with a battery back-up module (not shown)
such that information stored in memory 204 is not lost when the
general power to mobile device 200 is shut down. A portion of memory
204 is preferably allocated as addressable memory for program
execution, while another portion of memory 204 is preferably used
for storage, such as to simulate storage on a disk drive.

Memory 204 includes an operating system 212, application programs
214 as well as an object store 216. During operation, operating
system 212 is preferably executed by processor 202 from memory 204.
Operating system 212, in one preferred embodiment, is a WINDOWS® CE
brand operating system commercially available from Microsoft
Corporation. Operating system 212 is preferably designed for mobile
devices, and implements database features that can be utilized by
applications 214 through a set of exposed application programming
interfaces and methods. The objects in object store 216 are
maintained by applications 214 and operating system 212, at least
partially in response to calls to the exposed application
programming interfaces and methods.

Communication interface 208 represents numerous devices and
technologies that allow mobile device 200 to send and receive
information. The devices include wired and wireless modems,
satellite receivers and broadcast tuners to name a few. Mobile
device 200 can also be directly connected to a computer to exchange
data therewith. In such cases, communication interface 208 can be an
infrared transceiver or a serial or parallel communication
connection, all of which are capable of transmitting streaming
information.

Input/output components 206 include a variety of input devices such
as a touch-sensitive screen, buttons, rollers, and a microphone as
well as a variety of output devices including an audio generator, a
vibrating device, and a display. The devices listed above are by way
of example and need not all be present on mobile device 200. In
addition, other input/output devices may be attached to or found
with mobile device 200 within the scope of the present invention.

The present invention provides a method and apparatus for
reconstructing a speech signal using high resolution speech vectors.
FIG. 3 provides a block diagram of the system and FIG. 4 provides a
flow diagram of the method of the present invention.

At step 400, a noisy analog signal 300 is converted into a sequence
of digital values that are grouped into frames by a frame
constructor 302. Under one embodiment, the frames are constructed by
applying analysis windows to the digital values, where each analysis
window is a 25 millisecond Hamming window and the centers of the
windows are spaced 10 milliseconds apart.
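The framing of step 400 can be sketched as follows. This is only an illustrative sketch: the 16 kHz sample rate, the function names `hamming` and `make_frames`, and the choice to drop a trailing partial frame are assumptions that do not come from the description above.

```python
import math

def hamming(length):
    """Return a Hamming window of the given length in samples."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]

def make_frames(samples, sample_rate, window_ms=25.0, hop_ms=10.0):
    """Group samples into overlapping frames weighted by a Hamming window.

    Consecutive windows start hop_ms apart, so their centers are also
    hop_ms apart. A trailing partial frame is dropped (an assumption).
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    window = hamming(win_len)
    frames = []
    start = 0
    while start + win_len <= len(samples):
        frames.append([samples[start + i] * window[i] for i in range(win_len)])
        start += hop
    return frames

# One second of a constant signal at an assumed 16 kHz sample rate.
frames = make_frames([1.0] * 16000, sample_rate=16000)
```

At the assumed sample rate, each frame holds 400 samples and successive frames start 160 samples apart.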

At step 402, a frame of the digital speech signal is provided to a
Fast Fourier Transform 304 to compute the phase and magnitude of a
set of frequencies found in the frame. Under one embodiment, Fast
Fourier Transform 304 produces noisy magnitudes 306 and phases 308
for 128 frequencies in each frame. The phases 308 for the
frequencies are stored for later use. A log function 310 is applied
to magnitudes 306 at step 408 to compute the logarithm of each
magnitude.
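A minimal sketch of the magnitude, phase, and logarithm computation is given below. For self-containment it uses a naive discrete Fourier transform in place of a Fast Fourier Transform, which produces the same values more slowly. The function name and the small floor that guards against taking the logarithm of zero are assumptions.

```python
import cmath
import math

def log_magnitudes_and_phases(frame, num_bins):
    """Return (log magnitudes, phases) for the first num_bins DFT bins."""
    n = len(frame)
    log_mags, phases = [], []
    for k in range(num_bins):
        # Naive DFT coefficient for bin k (an FFT would give the same value).
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitude = abs(coeff)
        # Floor avoids log(0) in empty bins; this guard is an assumption.
        log_mags.append(math.log(max(magnitude, 1e-12)))
        phases.append(cmath.phase(coeff))  # kept for the later inverse transform
    return log_mags, phases

# Eight samples of a cosine that completes two cycles in the frame.
frame = [math.cos(2.0 * math.pi * 2 * t / 8) for t in range(8)]
log_mags, phases = log_magnitudes_and_phases(frame, num_bins=5)
```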

At step 410, the logarithm of each magnitude is provided to a finite
impulse response (FIR) filter 312, which filters each magnitude over
time. Under one embodiment, the FIR filter uses three consecutive
frames for filtering, with filter parameters of (0.25 0.5 0.25).
This smooths the log magnitudes and reduces spurious errors.
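The three-frame filter can be sketched as below. The handling of the first and last frames (repeating the nearest frame) is an assumption, since the description does not say how edges are treated, and the function name is hypothetical.

```python
def smooth_over_time(log_mag_frames, taps=(0.25, 0.5, 0.25)):
    """Filter each frequency bin across three consecutive frames."""
    num_frames = len(log_mag_frames)
    smoothed = []
    for t in range(num_frames):
        # Edge frames reuse the nearest available frame (an assumption).
        prev_frame = log_mag_frames[max(t - 1, 0)]
        cur_frame = log_mag_frames[t]
        next_frame = log_mag_frames[min(t + 1, num_frames - 1)]
        smoothed.append([taps[0] * p + taps[1] * c + taps[2] * n
                         for p, c, n in zip(prev_frame, cur_frame, next_frame)])
    return smoothed

# A single-bin example with a one-frame spike.
smoothed = smooth_over_time([[0.0], [4.0], [0.0]])
```

Note that the filter runs along the frame index for a fixed frequency bin; it never mixes values from different bins.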

The filtered log magnitudes are provided as a vector of magnitude
values to a posterior calculator 314, which computes a posterior
probability for the vector at step 412. The posterior probability
provides the probability of a clean speech log magnitude vector
given the noisy speech log magnitude vector. Under one embodiment, a
mixture model is used consisting of a mixture of different posterior
components, each having a mean and variance. Under one specific
embodiment, a mixture model consisting of 512 male speaker mixture
components and 512 female speaker mixture components is used. One
technique for computing the posterior probabilities is discussed
further below in connection with FIG. 5.

At step 414, the posterior probability is used to compute an
estimate of the clean log magnitude spectrum using an estimator 316.
Under one embodiment, the estimate of the clean log magnitude
spectrum is a weighted average of the minimum mean square error
estimates calculated from each of the mixture components of the
posterior probability.

The estimated clean signal log magnitude values are exponentiated at
step 416 by an exponent function 318 to produce estimates of the
clean magnitudes 320. At step 418, an inverse Fast Fourier Transform
322 is applied to the clean magnitudes 320 using the stored phases
308 taken from the noisy signal at step 402 above. The inverse Fast
Fourier Transform results in a frame of time domain digital values
for the frame.
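Steps 416 and 418 can be illustrated with the sketch below, which exponentiates the estimated log magnitudes, attaches the stored noisy phases, and inverts the transform. It assumes that the bins supplied cover indices 0 through half the frame length and fills the remaining bins by conjugate symmetry so that the result is real. A naive inverse DFT stands in for the inverse Fast Fourier Transform, and the function name is hypothetical.

```python
import cmath
import math

def frame_from_log_spectrum(clean_log_mags, noisy_phases, frame_len):
    """Rebuild one real time-domain frame from bins 0..frame_len // 2."""
    half = [math.exp(lm) * cmath.exp(1j * ph)
            for lm, ph in zip(clean_log_mags, noisy_phases)]
    # Fill the upper bins by conjugate symmetry so the output is real.
    spectrum = [0j] * frame_len
    for k, value in enumerate(half):
        spectrum[k] = value
        if 0 < k < frame_len - k:
            spectrum[frame_len - k] = value.conjugate()
    # Naive inverse DFT (an inverse FFT would give the same values).
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / frame_len)
                for k in range(frame_len)).real / frame_len
            for t in range(frame_len)]

# Bin 2 carries magnitude 4; the other bins are made negligibly small.
log_mags = [-50.0, -50.0, math.log(4.0), -50.0, -50.0]
frame = frame_from_log_spectrum(log_mags, [0.0] * 5, frame_len=8)
```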

At step 420, an overlap and add unit 326 is used to overlap and add
the frames of digital values produced by the inverse Fast Fourier
Transform to produce a clean digital signal 328. Under one
embodiment, this is done using synthesis windows that are designed
to provide perfect reconstruction when the analyzed signal is
perfect and to reduce edge effects. Under one particular embodiment,
when an analysis window of a(s) is used, the synthesis window, b(s),
is defined as:

$$ b(s) = \frac{a(s)}{\sum_{n} a^{2}(s - n r)} $$          EQ. 1

where r is the time period between the beginning of successive
analysis windows, n indexes the windows, and the summation is taken
over the number of windows.
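The sketch below illustrates one way a synthesis window of this kind yields perfect reconstruction: each analysis window value is divided by the sum of squared analysis window values over all window shifts that cover the same sample. The function names, the short 8-sample window, and the hop of 2 samples are assumptions chosen to keep the example small.

```python
import math

def synthesis_window(analysis, hop):
    """b(s) = a(s) / (sum of a^2 over all shifts by multiples of hop)."""
    length = len(analysis)
    synth = []
    for s in range(length):
        # Positions s - n*hop that fall inside the window, for integer n.
        denom = sum(analysis[u] ** 2 for u in range(s % hop, length, hop))
        synth.append(analysis[s] / denom)
    return synth

def overlap_add(frames, synth, hop):
    """Weight each frame by the synthesis window and sum the overlaps."""
    length = len(synth)
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for i, frame in enumerate(frames):
        for s in range(length):
            out[i * hop + s] += frame[s] * synth[s]
    return out

# Small demonstration: window a ramp, then rebuild it.
analysis = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / 7) for i in range(8)]
synth = synthesis_window(analysis, hop=2)
signal = [float(i) for i in range(20)]
frames = [[signal[start + s] * analysis[s] for s in range(8)]
          for start in range(0, 13, 2)]
rebuilt = overlap_add(frames, synth, hop=2)
```

Samples near the two ends of the signal are covered by fewer windows, so only the interior samples are recovered exactly in this demonstration.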

The output clean digital signal 328 can 
then be written to output audio hardware so that it 
is perceptible to users or stored at step 422. 
As shown above, the present invention does not apply a
frequency-based transform to the noisy log-magnitude values before
determining the posterior probability. A frequency-based transform
is one in which the level of filtering applied to a frequency is
based on the identity of the frequency, or in which the magnitudes
of the frequencies are scaled and combined to form fewer parameters.
(Note that the FIR filter in FIG. 3 is a time-domain filter that
filters across different frames in time. It does not filter based on
the identity of the frequency but instead filters based on the value
of the frequency component at different times.) In particular, the
present invention does not apply a Mel-Scale transform as was
conventionally done in the prior art. This results in a high
resolution feature vector being applied to the posterior probability
calculation.

By retaining all of the frequencies in the feature vector, the
present invention provides a better posterior calculation, and thus
a better estimate for the clean speech frequencies. In addition,
because the number of frequency bins has not been reduced, the
reconstructed signal is more intelligible, since information was not
lost through a Mel-Scale transform.

A process for identifying the posterior probability $p(n,x,c \mid y)$
of noise, n, channel distortion, c, and clean signal, x, given a
noisy signal, y, is shown in FIG. 5. The process of FIG. 5 begins at
step 500, where the means and variances for the mixture components
of a prior probability, $p(n,x,c)$, and an observation probability,
$p(y \mid n,x,c)$, are determined.

To generate the means and variances of the prior probability, the
process of one embodiment of the present invention first generates a
mixture of Gaussians that describes the distribution of a set of
training noise feature vectors, a second mixture of Gaussians that
describes a distribution of a set of training channel distortion
feature vectors, and a third mixture of Gaussians that describes a
distribution of a set of training clean signal feature vectors. The
mixture components can be formed by grouping training feature
vectors using a maximum likelihood training technique or by grouping
training feature vectors that represent a temporal section of a
signal together. Those skilled in the art will recognize that other
techniques for grouping the feature vectors into mixture components
may be used and that the two techniques listed above are only
provided as examples. Under one embodiment, one mixture component is
used for noise, one mixture component is used for channel
distortion, and 128 mixture components are used for clean speech.

After the training feature vectors have been grouped into their
respective mixture components, the mean and variance of the feature
vectors within each component are determined. In an embodiment in
which maximum likelihood training is used to group the feature
vectors, the means and variances are provided as by-products of
grouping the feature vectors into the mixture components.
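For a group of feature vectors that has already been assigned to one mixture component, the per-element mean and variance can be computed as in the sketch below. The diagonal (per-element) variance and the population form of the variance (dividing by the count) are assumptions; the description does not specify either.

```python
def component_stats(vectors):
    """Per-element mean and variance of the vectors in one component."""
    count = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[j] for v in vectors) / count for j in range(dim)]
    # Population variance (divide by count); this choice is an assumption.
    variances = [sum((v[j] - means[j]) ** 2 for v in vectors) / count
                 for j in range(dim)]
    return means, variances

means, variances = component_stats([[1.0, 2.0], [3.0, 6.0]])
```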

After the means and variances have been 
determined for the mixture components of the noise 
feature vectors, clean signal feature vectors, and 
channel feature vectors, these mixture components are 
combined to form a mixture of Gaussians that 
describes the total prior probability. Using one 
technique, the mixture of Gaussians for the total 
prior probability will be formed at the intersection 
of the mixture components of the noise feature 
vectors, clean signal feature vectors, and channel 
distortion feature vectors. 
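One reading of this combination is that every triple of one clean signal component, one noise component, and one channel distortion component becomes a component of the total prior. The sketch below follows that reading. The tuple layout, the function name, the stacking order (clean, noise, channel), and the use of a product of the three component weights (which treats the three sources as independent) are all assumptions made for illustration.

```python
from itertools import product

def combine_priors(clean_comps, noise_comps, channel_comps):
    """Form one combined component per (clean, noise, channel) triple.

    Each input component is a tuple (weight, mean_vector, variance_vector).
    """
    combined = []
    for (wx, mx, vx), (wn, mn, vn), (wc, mc, vc) in product(
            clean_comps, noise_comps, channel_comps):
        combined.append((wx * wn * wc,   # assumes the sources are independent
                         mx + mn + mc,   # list concatenation: stacked mean
                         vx + vn + vc))  # stacked (diagonal) variance
    return combined

clean = [(0.4, [1.0], [0.1]), (0.6, [2.0], [0.2])]
noise = [(1.0, [0.5], [0.3])]
channel = [(1.0, [0.0], [0.05])]
combined = combine_priors(clean, noise, channel)
```

With one noise component, one channel component, and 128 clean speech components, this construction would give 128 combined components.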

The variances of the mixture components of 
the observation probability are determined using a 
closed form expression of the form: 

$$ \Psi = \mathrm{VAR}(y \mid x, n) = \frac{\alpha^{2}}{\cosh^{2}\left(\frac{n - x}{2}\right)} $$          EQ. 2

where $\alpha$ is estimated from the training data.

Under other embodiments, these variances 
are formed using a training clean signal, a training 
noise signal, and a set of training channel 
distortion vectors that represent the channel 
distortion that will be applied to the clean signal 
and noise signal. 

The training clean signal and the training noise signal are
separately converted into sequences of feature vectors. These
feature vectors, together with the channel distortion feature
vectors, are then applied to an equation that approximates the
relationship between observed noisy vectors and clean signal
vectors, noise vectors, and channel distortion vectors. Under one
embodiment, this equation is of the form:

$$ y = c + x + \ln\left(1 + e^{(n - c - x)}\right) $$          EQ. 3

where y is an observed noisy feature vector, c is a channel
distortion feature vector, x is a clean signal feature vector, and n
is a noise feature vector. In equation 3:

$$ \ln\left(1 + e^{(n - c - x)}\right) =
\begin{bmatrix}
\ln\left(1 + e^{(n_1 - c_1 - x_1)}\right) \\
\vdots \\
\ln\left(1 + e^{(n_j - c_j - x_j)}\right) \\
\vdots
\end{bmatrix} $$          EQ. 4

where $n_j$, $c_j$, and $x_j$ are the jth elements in the noise
feature vector, channel feature vector, and clean signal feature
vector, respectively.
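The element-wise relationship of equations 3 and 4 can be sketched as follows. The function name is hypothetical, and the sketch makes no attempt to guard against overflow for very large exponents.

```python
import math

def predicted_noisy(clean, noise, channel):
    """Element-wise y_j = c_j + x_j + ln(1 + exp(n_j - c_j - x_j))."""
    return [c + x + math.log1p(math.exp(n - c - x))
            for x, n, c in zip(clean, noise, channel)]

y = predicted_noisy([1.0], [2.0], [0.5])
```

Algebraically, each element equals ln(exp(c_j + x_j) + exp(n_j)), which is a convenient check on any implementation.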

Under one embodiment, the training clean 
signal feature vectors, training noise feature 
vectors, and channel distortion feature vectors used 
to determine the mixture components of the prior 
probability are reused in equation 3 to produce 
calculated noisy feature vectors. Thus, each mixture 
component of the prior probability produces its own 
set of calculated noisy feature vectors. 

The training clean signal is also allowed to pass through a training
channel before being combined with the training noise signal. The
resulting analog signal is then converted into feature vectors to
produce a sequence of observed noisy feature vectors. The observed
noisy feature vectors are aligned with their respective calculated
noisy feature vectors so that the observed values can be compared to
the calculated values.

For each mixture component in the prior probability, the average
difference between the calculated noisy feature vectors associated
with that mixture component and the observed noisy feature vectors
is determined. This average value is used as the variance for the
corresponding mixture component of the observation probability.
Thus, the calculated noisy feature vector produced from the third
mixture component of the prior probability would be used to produce
a variance for the third mixture component of the observation
probability. At the end of step 500, a variance has been calculated
for each mixture component of the observation probability.

After the parameters of the mixture components of the prior
probability and the observation probability have been determined,
the process of FIG. 5 continues at step 502, where the first mixture
component of the prior probability and the observation probability
is selected.

Due to the non-linear relationship in Equation 3, the true posterior
is non-Gaussian. However, under one embodiment of the invention, the
posterior is approximated as a Gaussian. In order to make this
approximation, a linear approximation of Equation 3 must be made.
This is done using a first order Taylor series expansion of:

$$ y = g(z_0) + g'(z_0)(z - z_0) $$          EQ. 5

where $z$ and $z_0$ are stacked vectors representing a combination
of a noise vector, channel vector and clean signal vector such that

$$ z = \left[ x^{T} \; n^{T} \; c^{T} \right]^{T} $$          EQ. 6

$$ z_0 = \left[ x_0^{T} \; n_0^{T} \; c_0^{T} \right]^{T} $$          EQ. 7

and where

$$ g(z_0) = x_0 + c_0 + \ln\left(1 + e^{(n_0 - c_0 - x_0)}\right) $$          EQ. 8

and $g'(z_0)$ is the derivative of $g(z_0)$ determined at expansion
point $z_0$.
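To make the linearization concrete, the sketch below treats a single frequency bin, so that the stacked vector reduces to three numbers (clean, noise, channel). The gradient entries follow from differentiating the function of equation 8 by hand and are expressed through a logistic term; the function names and the single-bin simplification are assumptions.

```python
import math

def g(z):
    """Right-hand side of the observation function for one bin."""
    x, n, c = z
    return x + c + math.log1p(math.exp(n - c - x))

def g_prime(z):
    """Partial derivatives of g with respect to (x, n, c)."""
    x, n, c = z
    s = 1.0 / (1.0 + math.exp(-(n - c - x)))  # logistic of (n - c - x)
    return [1.0 - s, s, 1.0 - s]

def linearized(z, z0):
    """First order Taylor expansion of g around the expansion point z0."""
    grad = g_prime(z0)
    return g(z0) + sum(gi * (zi - z0i) for gi, zi, z0i in zip(grad, z, z0))

z0 = (1.0, 0.5, 0.1)
approx = linearized((1.01, 0.49, 0.1), z0)
exact = g((1.01, 0.49, 0.1))
```

Close to the expansion point the two values agree to within the second order remainder, which is why the expansion point is moved on each iteration of the process described next.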

Using the Taylor series expansion, the mean and variance of the
posterior probability can be calculated iteratively using:

$$ \eta = \eta_p + \Phi\left[\Sigma^{-1}(\mu - \eta_p) + g'(\eta_p)^{T}\,\Psi^{-1}\,\bigl(y - g(\eta_p)\bigr)\right] $$          EQ. 9

$$ \Phi = \left[\Sigma^{-1} + g'(\eta_p)^{T}\,\Psi^{-1}\,g'(\eta_p)\right]^{-1} $$          EQ. 10

where $\eta$ is the newly calculated mean for the posterior
probability of the current mixture, $\eta_p$ is the mean for the
posterior probability determined in a previous iteration,
$\Sigma^{-1}$ is the inverse of the covariance matrix for this
mixture component of the prior probability, $\mu$ is the mean for
this mixture component of the prior probability, $\Psi$ is the
variance of this mixture component of the observation probability,
$\Phi$ is the variance of the posterior probability for this mixture
component, $g(\eta_p)$ is the right-hand side of equation 8
evaluated with the expansion point set equal to the mean of the
previous iteration, $g'(\eta_p)$ is the matrix derivative of
equation 8 calculated at the mean of the previous iteration, and y
is the observed feature vector.

In equation 9, $\mu$, $\eta$ and $\eta_p$ are M-by-1 matrices where
M is three times the number of elements in each feature vector. In
particular, $\mu$, $\eta$ and $\eta_p$ are described by vectors
having the form:

$$ \begin{bmatrix}
\tfrac{M}{3}\ \text{Elements For Clean Signal Feature Vector} \\
\tfrac{M}{3}\ \text{Elements For Noise Feature Vector} \\
\tfrac{M}{3}\ \text{Elements For Channel Distortion Feature Vector}
\end{bmatrix} $$          EQ. 11



Using this definition for $\mu$, $\eta$ and $\eta_p$, and using
$\eta_p$ as the expansion point $z_0$, Equation 8 above can be
described as:

$$ g(\eta_p) = \eta_p\!\left(1:\tfrac{M}{3}\right) + \eta_p\!\left(\tfrac{2M}{3}+1:M\right) + \ln\!\left(1 + e^{\left(\eta_p\left(\frac{M}{3}+1:\frac{2M}{3}\right) - \eta_p\left(\frac{2M}{3}+1:M\right) - \eta_p\left(1:\frac{M}{3}\right)\right)}\right) $$          EQ. 12

where the designations in equation 12 indicate the spans of rows
which form the feature vectors for those elements.

In equations 9 and 10, the derivative $g'(\eta_p)$ is a matrix of
order $\tfrac{M}{3}$-by-M where the element of row i, column j is
defined as:

$$ \left[g'(\eta_p)\right]_{i,j} = \frac{\partial \left[g(\eta_p)\right]_i}{\partial \left[\eta_p\right]_j} $$          EQ. 13

where the expression on the right side of equation 13 is a partial
derivative of the equation that describes the ith element of
$g(\eta_p)$ relative to the jth element of the $\eta_p$ matrix.
Thus, if the jth element of the $\eta_p$ matrix is the fifth element
of the noise feature vector, $n_5$, the partial derivative will be
taken relative to $n_5$.

The iterative process for determining the mean and variance of the
posterior probability is shown in steps 504, 506, 508, 510 and 512
of FIG. 5. At step 504, the expansion point $z_0$ is set equal to
the mean of the prior probability model. Thus, for the first
iteration, $\eta_p = \mu$. At step 506, equation 10 is used to
determine the variance $\Phi$. At step 508, the variance is used in
equation 9 to update the mean of the posterior probability. After
the mean and variance have been updated, the process determines if
more iterations should be performed at step 510.

If more iterations are to be performed, the current mean $\eta$ is
set as the past mean $\eta_p$ at step 512 so that the current mean
is used as the expansion point in the next iteration. The process
then returns to step 506. Steps 506, 508, 510 and 512 are then
repeated until the desired number of iterations has been performed.
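The loop of steps 504 through 512 can be sketched for one mixture component and a single frequency bin as below. Several simplifications are assumptions made for illustration only: the prior covariance is taken to be diagonal, the observation variance is a scalar, the number of iterations is fixed at three, and the matrix inverse in the variance update is evaluated with the Sherman-Morrison identity rather than by explicit inversion. The update lines follow the usual linearized Gaussian posterior form, with the posterior variance equal to the inverse of the prior precision plus the Jacobian-weighted observation precision.

```python
import math

def g(z):
    """Observation function for one bin, with z = (x, n, c)."""
    x, n, c = z
    return x + c + math.log1p(math.exp(n - c - x))

def g_prime(z):
    """Row of partial derivatives of g with respect to (x, n, c)."""
    x, n, c = z
    s = 1.0 / (1.0 + math.exp(-(n - c - x)))
    return [1.0 - s, s, 1.0 - s]

def posterior_for_component(y, mu, prior_var, obs_var, iterations=3):
    """Return (posterior mean, posterior covariance) for one component."""
    eta_p = list(mu)  # step 504: expand around the prior mean
    phi = None
    for _ in range(iterations):
        grad = g_prime(eta_p)
        # Step 506: posterior covariance, via Sherman-Morrison for a
        # diagonal prior covariance and a scalar observation variance.
        denom = obs_var + sum(gk * gk * vk for gk, vk in zip(grad, prior_var))
        phi = [[(prior_var[i] if i == j else 0.0)
                - prior_var[i] * grad[i] * prior_var[j] * grad[j] / denom
                for j in range(3)] for i in range(3)]
        # Step 508: update the mean using the new covariance.
        residual = (y - g(eta_p)) / obs_var
        rhs = [(mu[i] - eta_p[i]) / prior_var[i] + grad[i] * residual
               for i in range(3)]
        eta = [eta_p[i] + sum(phi[i][j] * rhs[j] for j in range(3))
               for i in range(3)]
        eta_p = eta  # step 512: the new mean becomes the expansion point
    return eta_p, phi

mu = [1.0, 0.0, 0.0]
prior_var = [1.0, 1.0, 0.1]
y = g(mu) + 0.5
eta, phi = posterior_for_component(y, mu, prior_var, obs_var=0.01)
```

In this example the observation is larger than the prior mean predicts, so the posterior mean moves until the predicted noisy value lies close to the observation, and the posterior variance of the clean element shrinks below its prior variance.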

After the mean and variance for the first mixture component of the
posterior probability have been determined, the process of FIG. 5
continues by determining whether there are more mixture components
at step 514. If there are more mixture components, the next mixture
component is selected at step 516 and steps 504, 506, 508, 510 and
512 are repeated for the new mixture component.

Once a mean and variance have been determined for each mixture
component of the posterior probability, the process of FIG. 5
continues at step 518, where the mixture components are combined to
identify a most likely clean signal feature vector given the
observed noisy signal feature vector. Under one embodiment, the
clean signal feature vector is calculated as:




$$ x_{post} = \sum_{s=1}^{S} \rho_s \, \eta_s\!\left(1:\tfrac{M}{3}\right) $$          EQ. 14

where S is the number of mixture components, $\rho_s$ is the weight
for mixture component s, $\eta_s\!\left(1:\tfrac{M}{3}\right)$ is
the feature vector for the mean of the posterior probability of the
clean signal, and $x_{post}$ is the weighted average value of the
clean signal feature vector given the observed noisy feature vector.
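The weighted average can be sketched as below, under the assumption that the first third of each stacked posterior mean holds the clean signal elements. The function name and the `dim` argument (the number of clean signal elements) are hypothetical.

```python
def clean_estimate(weights, posterior_means, dim):
    """Weighted average of the clean-signal rows of each posterior mean."""
    estimate = [0.0] * dim
    for rho, eta in zip(weights, posterior_means):
        for j in range(dim):
            estimate[j] += rho * eta[j]  # only rows 0..dim-1 are clean signal
    return estimate

means = [[1.0, 2.0, 9.0, 9.0, 9.0, 9.0],
         [3.0, 4.0, 9.0, 9.0, 9.0, 9.0]]
x_post = clean_estimate([0.25, 0.75], means, dim=2)
```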

The weight for each mixture component, $\rho_s$, is calculated as:

$$ \rho_s = \frac{\pi_s e^{G_s}}{\sum_{i=1}^{S} \pi_i e^{G_i}} $$          EQ. 15

where the denominator of equation 15 normalizes the weights by
dividing each weight by the sum of the weights of all of the mixture
components. In equation 15, $\pi_s$ is a weight associated with the
mixture components of the prior probability and is determined as:

$$ \pi_s = \pi_s^{x} \cdot \pi_s^{n} \cdot \pi_s^{c} $$          EQ. 16

where $\pi_s^{x}$, $\pi_s^{n}$, and $\pi_s^{c}$ are mixture
component weights for the prior clean signal, prior noise, and prior
channel distortion, respectively. These weights are determined as
part of the calculation of the mean and variance for the prior
probability.
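The normalization of the component weights can be sketched as follows. Each unnormalized weight is the prior weight multiplied by the exponential of the component's G value, and the results are divided by their sum. Working with logarithms and subtracting the largest term before exponentiating is an implementation choice added here for numerical safety, not something the description requires, and the function name is hypothetical.

```python
import math

def component_weights(prior_weights, g_values):
    """Normalize prior_weight * exp(G) over all mixture components."""
    log_terms = [math.log(pi) + gs for pi, gs in zip(prior_weights, g_values)]
    peak = max(log_terms)
    # Subtracting the peak leaves the ratios unchanged but avoids underflow.
    unnorm = [math.exp(t - peak) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

rho = component_weights([0.5, 0.5], [0.0, math.log(3.0)])
```

Because only ratios matter, very negative G values, which are common for log-likelihood-like quantities, do not cause every weight to underflow to zero.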

In equation 15, $G_s$ is a function that affects the weighting of a
mixture component based on the shape of the prior probability and
posterior probability, as well as the similarity between the
selected mean for the posterior probability and the observed noisy
vector and the similarity between the selected mean and the mean of
the prior probability. Under one embodiment, the expression for
$G_s$ is:

$$ G_s = -\tfrac{1}{2}\ln\left|2\pi\Sigma_s\right| + \tfrac{1}{2}\ln\left|2\pi\Phi_s\right| - \tfrac{1}{2}\bigl(y - g(\eta_s)\bigr)^{T}\Psi_s^{-1}\bigl(y - g(\eta_s)\bigr) - \tfrac{1}{2}\bigl(\eta_s - \mu_s\bigr)^{T}\Sigma_s^{-1}\bigl(\eta_s - \mu_s\bigr) $$          EQ. 17

where $\ln\left|2\pi\Sigma_s\right|$ involves taking the natural log
of the determinant of $2\pi$ times the covariance of the prior
probability, and $\ln\left|2\pi\Phi_s\right|$ involves taking the
natural log of the determinant of $2\pi$ times the covariance matrix
of the posterior probability.

In other embodiments, the clean signal vector is estimated as:

$$ x_{post} = \sum_{s} \rho_s \int x \, p(x \mid y) \, dx $$          EQ. 18

Those skilled in the art will recognize that there are other ways of
using the mixture approximation to the posterior to obtain
statistics. For example, the mean of the mixture component with the
largest weight $\rho$ can be selected. Or, the entire mixture
distribution can be used as input to a recognizer.

Although a particular method for 
determining the posterior probability is discussed 
above, those skilled in the art will recognize that 
any technique for identifying the posterior 
probability may be used with the present invention. 

Although the present invention has been 
described with reference to particular embodiments, 
workers skilled in the art will recognize that 
changes may be made in form and detail without 
departing from the spirit and scope of the invention. 



